In this assignment, you will perform fundamental analysis for the San Francisco housing market to allow potential real estate investors to choose rental investment properties.
# imports
import panel as pn
pn.extension('plotly')
import plotly.express as px
import pandas as pd
import hvplot.pandas
import matplotlib.pyplot as plt
import numpy as np
import os
from pathlib import Path
from dotenv import load_dotenv
import warnings
warnings.filterwarnings('ignore')
# Read the Mapbox API key
load_dotenv()
map_box_api = os.getenv("mapbox")
# Read the census data into a Pandas DataFrame
file_path = Path("Data/sfo_neighborhoods_census_data.csv")
sfo_data = pd.read_csv(file_path, index_col="year")
sfo_data.head()
| neighborhood | sale_price_sqr_foot | housing_units | gross_rent | |
|---|---|---|---|---|
| year | ||||
| 2010 | Alamo Square | 291.182945 | 372560 | 1239 |
| 2010 | Anza Vista | 267.932583 | 372560 | 1239 |
| 2010 | Bayview | 170.098665 | 372560 | 1239 |
| 2010 | Buena Vista Park | 347.394919 | 372560 | 1239 |
| 2010 | Central Richmond | 319.027623 | 372560 | 1239 |
In this section, you will calculate the number of housing units per year and visualize the results as a bar chart using the Pandas plot function.
Hint: Use the Pandas groupby function.
Optional challenge: Use the min, max, and std to scale the y limits of the chart.
# Calculate the mean number of housing units per year (hint: use groupby)
data_group_per_year = sfo_data["housing_units"].groupby(sfo_data.index).mean()
data_group_per_year
year 2010 372560 2011 374507 2012 376454 2013 378401 2014 380348 2015 382295 2016 384242 Name: housing_units, dtype: int64
# dont know how to do this
# Save the dataframe as a csv file
data_group_per_year.to_csv("housing_units_per_year.csv",header=False)
# Use the Pandas plot function to plot the average housing units per year.
fig_housing_units = plt.figure()
plot_housing_units = data_group_per_year.plot.bar()
plot_housing_units.set_xlabel('Year', fontsize=12)
plot_housing_units.set_ylabel('Housing Units', fontsize=12)
plot_housing_units.set_title("Housing Units in San Francisco from 2010 to 2016", fontsize=15, fontweight='bold')
housing_units_std =data_group_per_year.std()
housing_units_min =min(data_group_per_year)
housing_units_max =max(data_group_per_year)
plot_housing_units.set_ylim(housing_units_min -housing_units_std, housing_units_max + housing_units_std)
plt.show()
plt.close(fig_housing_units)
In this section, you will calculate the average monthly rent and the average price per square foot for each year. An investor may wish to better understand the sales price of the rental property over time. For example, a customer will want to know if they should expect an increase or decrease in the property value over time so they can determine how long to hold the rental property. Plot the results as two line charts.
Optional challenge: Plot each line chart in a different color.
# Calculate the average sale price per square foot and average gross rent
Average_of_housing_Cost_per_year = sfo_data[["sale_price_sqr_foot","gross_rent"]].groupby(sfo_data.index).mean()
Average_of_housing_Cost_per_year
| sale_price_sqr_foot | gross_rent | |
|---|---|---|
| year | ||
| 2010 | 369.344353 | 1239 |
| 2011 | 341.903429 | 1530 |
| 2012 | 399.389968 | 2324 |
| 2013 | 483.600304 | 2971 |
| 2014 | 556.277273 | 3528 |
| 2015 | 632.540352 | 3739 |
| 2016 | 697.643709 | 4390 |
# Save the dataframe as a csv file
Average_of_housing_Cost_per_year.reset_index(inplace=True)
Average_of_housing_Cost_per_year.to_csv("Average_of_housing_Cost_per_year.csv",header=True,index=False)
Average_of_housing_Cost_per_year
| year | sale_price_sqr_foot | gross_rent | |
|---|---|---|---|
| 0 | 2010 | 369.344353 | 1239 |
| 1 | 2011 | 341.903429 | 1530 |
| 2 | 2012 | 399.389968 | 2324 |
| 3 | 2013 | 483.600304 | 2971 |
| 4 | 2014 | 556.277273 | 3528 |
| 5 | 2015 | 632.540352 | 3739 |
| 6 | 2016 | 697.643709 | 4390 |
# Create two line charts, one to plot the average sale price per square foot and another for average montly rent
# Line chart for average sale price per square foot
fig_Average_of_housing_Cost_per_year = plt.figure()
plot_Average_of_housing_Cost_per_year = Average_of_housing_Cost_per_year['sale_price_sqr_foot'].plot(color = 'purple')
plot_Average_of_housing_Cost_per_year.set_xlabel('Year', fontsize=12)
plot_Average_of_housing_Cost_per_year.set_ylabel('Price per SqFt', fontsize=12)
plot_Average_of_housing_Cost_per_year.set_title("Average Sale Price per sqrfoot by Year, San Francisco", fontsize=15, fontweight='bold');
# Line chart for average montly rent
fig_Average_of_housing_Cost_per_year = plt.figure()
plot_Average_of_housing_Cost_per_year = Average_of_housing_Cost_per_year['gross_rent'].plot(color = 'red')
plot_Average_of_housing_Cost_per_year.set_xlabel('Year', fontsize=12)
plot_Average_of_housing_Cost_per_year.set_ylabel('Gross Rent', fontsize=12)
plot_Average_of_housing_Cost_per_year.set_title("Average Gross Rent by Year, San Francisco", fontsize=15, fontweight='bold');
In this section, you will use hvplot to create two interactive visulizations of average prices with a dropdown selector for the neighborhood. The first visualization will be a line plot showing the trend of average price per square foot over time for each neighborhood. The second will be a line plot showing the trend of average montly rent over time for each neighborhood.
Hint: It will be easier to create a new DataFrame from grouping the data and calculating the mean prices for each year and neighborhood
# Group by year and neighborhood and then create a new dataframe of the mean values
df_costs = sfo_data.groupby(["year","neighborhood"]).mean()
df_costs.reset_index(inplace=True)
df_costs.rename(columns={"level_0":"year"}, inplace=True)
df_costs.head(10)
| year | neighborhood | sale_price_sqr_foot | housing_units | gross_rent | |
|---|---|---|---|---|---|
| 0 | 2010 | Alamo Square | 291.182945 | 372560 | 1239 |
| 1 | 2010 | Anza Vista | 267.932583 | 372560 | 1239 |
| 2 | 2010 | Bayview | 170.098665 | 372560 | 1239 |
| 3 | 2010 | Buena Vista Park | 347.394919 | 372560 | 1239 |
| 4 | 2010 | Central Richmond | 319.027623 | 372560 | 1239 |
| 5 | 2010 | Central Sunset | 418.172493 | 372560 | 1239 |
| 6 | 2010 | Corona Heights | 369.359338 | 372560 | 1239 |
| 7 | 2010 | Cow Hollow | 569.379968 | 372560 | 1239 |
| 8 | 2010 | Croker Amazon | 165.645730 | 372560 | 1239 |
| 9 | 2010 | Diamond Heights | 456.930822 | 372560 | 1239 |
# Save the dataframe as a csv file
df_costs.to_csv("costs.csv",index=False)
# Use hvplot to create an interactive line chart of the average price per sq ft.
# The plot should have a dropdown selector for the neighborhood
df_costs.hvplot.line(x='year', y=['sale_price_sqr_foot'],
groupby=(['neighborhood']),
rot=90).opts(yformatter="%.0f",
title="Average Price per sq ft.",
ylabel = 'Average sale_price_sqr_foot',
xlabel='Year')
# Use hvplot to create an interactive line chart of the average monthly rent.
# The plot should have a dropdown selector for the neighborhood
df_costs.hvplot.line(x='year', y=['gross_rent'],
groupby=(['neighborhood']),
rot=90).opts(yformatter="%.0f",
title="Average gross_rent.",
ylabel = 'Average gross_rent',
xlabel='Year')
In this section, you will need to calculate the mean sale price per square foot for each neighborhood and then sort the values to obtain the top 10 most expensive neighborhoods on average. Plot the results as a bar chart.
# Getting the data from the top 10 expensive neighborhoods to own
expensive_neighborhood_data = (
df_costs[["neighborhood", "sale_price_sqr_foot", "housing_units", "gross_rent"]]
.groupby("neighborhood")
.mean().sort_values("sale_price_sqr_foot", ascending=False))
top_ten=expensive_neighborhood_data.head(10)
top_ten.reset_index(inplace=True)
top_ten
| neighborhood | sale_price_sqr_foot | housing_units | gross_rent | |
|---|---|---|---|---|
| 0 | Union Square District | 903.993258 | 377427.50 | 2555.166667 |
| 1 | Merced Heights | 788.844818 | 380348.00 | 3414.000000 |
| 2 | Miraloma Park | 779.810842 | 375967.25 | 2155.250000 |
| 3 | Pacific Heights | 689.555817 | 378401.00 | 2817.285714 |
| 4 | Westwood Park | 687.087575 | 382295.00 | 3959.000000 |
| 5 | Telegraph Hill | 676.506578 | 378401.00 | 2817.285714 |
| 6 | Presidio Heights | 675.350212 | 378401.00 | 2817.285714 |
| 7 | Cow Hollow | 665.964042 | 378401.00 | 2817.285714 |
| 8 | Potrero Hill | 662.013613 | 378401.00 | 2817.285714 |
| 9 | South Beach | 650.124479 | 375805.00 | 2099.000000 |
# Save the dataframe as a csv file
top_ten.to_csv("top_ten.csv",index=False)
# Plotting the data from the top 10 expensive neighborhoods
#top_10_exp_neighborhood = top_ten.nlargest(10,['sale_price_sqr_foot'])
top_ten.hvplot.bar(x='neighborhood',y='sale_price_sqr_foot',ylim=(500.0000 , 950.0000),
rot=90).opts(yformatter="%.0f",
title="The top 10 Most Expensive Neighborhoodsin San Francisco from 2010 to 2016")
In this section, you will use hvplot to create an interactive visualization with a dropdown selector for the neighborhood. This visualization will feature a side-by-side comparison of average price per square foot versus average montly rent by year.
Hint: Use the hvplot parameter, groupby, to create a dropdown selector for the neighborhood.
# Fetch the previously generated DataFrame that was grouped by year and neighborhood
df_costs.head(10)
| year | neighborhood | sale_price_sqr_foot | housing_units | gross_rent | |
|---|---|---|---|---|---|
| 0 | 2010 | Alamo Square | 291.182945 | 372560 | 1239 |
| 1 | 2010 | Anza Vista | 267.932583 | 372560 | 1239 |
| 2 | 2010 | Bayview | 170.098665 | 372560 | 1239 |
| 3 | 2010 | Buena Vista Park | 347.394919 | 372560 | 1239 |
| 4 | 2010 | Central Richmond | 319.027623 | 372560 | 1239 |
| 5 | 2010 | Central Sunset | 418.172493 | 372560 | 1239 |
| 6 | 2010 | Corona Heights | 369.359338 | 372560 | 1239 |
| 7 | 2010 | Cow Hollow | 569.379968 | 372560 | 1239 |
| 8 | 2010 | Croker Amazon | 165.645730 | 372560 | 1239 |
| 9 | 2010 | Diamond Heights | 456.930822 | 372560 | 1239 |
# Plotting the data from the top 10 expensive neighborhood
expensive_neighborhood_year = (sfo_data.groupby("neighborhood").mean().sort_values("sale_price_sqr_foot", ascending=False))
top_ten_year=expensive_neighborhood_data.head(10)
top_ten_year.reset_index(inplace=True)
top_ten_year
| neighborhood | sale_price_sqr_foot | housing_units | gross_rent | |
|---|---|---|---|---|
| 0 | Union Square District | 903.993258 | 377427.50 | 2555.166667 |
| 1 | Merced Heights | 788.844818 | 380348.00 | 3414.000000 |
| 2 | Miraloma Park | 779.810842 | 375967.25 | 2155.250000 |
| 3 | Pacific Heights | 689.555817 | 378401.00 | 2817.285714 |
| 4 | Westwood Park | 687.087575 | 382295.00 | 3959.000000 |
| 5 | Telegraph Hill | 676.506578 | 378401.00 | 2817.285714 |
| 6 | Presidio Heights | 675.350212 | 378401.00 | 2817.285714 |
| 7 | Cow Hollow | 665.964042 | 378401.00 | 2817.285714 |
| 8 | Potrero Hill | 662.013613 | 378401.00 | 2817.285714 |
| 9 | South Beach | 650.124479 | 375805.00 | 2099.000000 |
df_costs.hvplot.bar('year', y=['gross_rent','sale_price_sqr_foot'],
groupby=(['neighborhood']),
rot=90).opts(yformatter="%.0f",
title="Top 10 Expensive Neighborhood in San Francisco from 2010 to 2016",
ylabel = 'Housing Units',
xlabel='Neighborhood')
In this section, you will read in neighborhoods location data and build an interactive map with the average house value per neighborhood. Use a scatter_mapbox from Plotly express to create the visualization. Remember, you will need your Mapbox API key for this.
# Load neighborhoods coordinates data
file_path2 = Path("Data/neighborhoods_coordinates.csv")
coord_data = pd.read_csv(file_path2)
coord_data.head()
| Neighborhood | Lat | Lon | |
|---|---|---|---|
| 0 | Alamo Square | 37.791012 | -122.402100 |
| 1 | Anza Vista | 37.779598 | -122.443451 |
| 2 | Bayview | 37.734670 | -122.401060 |
| 3 | Bayview Heights | 37.728740 | -122.410980 |
| 4 | Bernal Heights | 37.728630 | -122.443050 |
You will need to join the location data with the mean values per neighborhood.
Calculate the mean values for each neighborhood.
Join the average values with the neighborhood locations.
# Calculate the mean values for each neighborhood
neighborhoods_mean = df_costs.groupby(by=["neighborhood"]).mean().drop(["year"], axis=1)
neighborhoods_mean.reset_index(inplace=True)
neighborhoods_mean.head()
| neighborhood | sale_price_sqr_foot | housing_units | gross_rent | |
|---|---|---|---|---|
| 0 | Alamo Square | 366.020712 | 378401.0 | 2817.285714 |
| 1 | Anza Vista | 373.382198 | 379050.0 | 3031.833333 |
| 2 | Bayview | 204.588623 | 376454.0 | 2318.400000 |
| 3 | Bayview Heights | 590.792839 | 382295.0 | 3739.000000 |
| 4 | Bernal Heights | 576.746488 | 379374.5 | 3080.333333 |
# Join the average values with the neighborhood locations
combined_df = pd.concat([coord_data , neighborhoods_mean], axis="columns", join="inner")
combined_df.head()
| Neighborhood | Lat | Lon | neighborhood | sale_price_sqr_foot | housing_units | gross_rent | |
|---|---|---|---|---|---|---|---|
| 0 | Alamo Square | 37.791012 | -122.402100 | Alamo Square | 366.020712 | 378401.0 | 2817.285714 |
| 1 | Anza Vista | 37.779598 | -122.443451 | Anza Vista | 373.382198 | 379050.0 | 3031.833333 |
| 2 | Bayview | 37.734670 | -122.401060 | Bayview | 204.588623 | 376454.0 | 2318.400000 |
| 3 | Bayview Heights | 37.728740 | -122.410980 | Bayview Heights | 590.792839 | 382295.0 | 3739.000000 |
| 4 | Bernal Heights | 37.728630 | -122.443050 | Bernal Heights | 576.746488 | 379374.5 | 3080.333333 |
combined_df.to_csv("combined_df.csv", index=False)
Plot the average values per neighborhood using a Plotly express scatter_mapbox visualization.
# Set the mapbox access token
map_box = px.scatter_mapbox(
combined_df,
lat="Lat",
lon="Lon",
size="sale_price_sqr_foot",
color="gross_rent",
zoom=4,
title= "Average Values per Neighborhood in San Francisco"
)
map_box.update_layout(
mapbox_style="carto-positron")
map_box.show() #display map
# Create a scatter mapbox to analyze neighborhood info
# YOUR CODE HERE!
In this section, you will use Plotly express to create visualizations that investors can use to interactively filter and explore various factors related to the house value of the San Francisco's neighborhoods.
# Fetch the data from all expensive neighborhoods per year.
most_expensive=df_costs[df_costs["neighborhood"].isin(top_ten["neighborhood"])]
most_expensive.head()
| year | neighborhood | sale_price_sqr_foot | housing_units | gross_rent | |
|---|---|---|---|---|---|
| 7 | 2010 | Cow Hollow | 569.379968 | 372560 | 1239 |
| 31 | 2010 | Miraloma Park | 680.608729 | 372560 | 1239 |
| 41 | 2010 | Pacific Heights | 496.516014 | 372560 | 1239 |
| 46 | 2010 | Potrero Hill | 491.450004 | 372560 | 1239 |
| 47 | 2010 | Presidio Heights | 549.417931 | 372560 | 1239 |
# Save the dataframe as a csv file
most_expensive.to_csv("most_expensive.csv",index=False)
# Parallel Categories Plot
px.parallel_categories(
top_ten,
dimensions=["neighborhood", "housing_units", "gross_rent"],
color="sale_price_sqr_foot",
color_continuous_scale=px.colors.sequential.Inferno,
title="Parallel Categories Plot of Most Expensive Neighborhoods in San Francisco per year",
labels={
"type": "Neighborhood",
"region": "Housing Units",
"prop_size": "Gross Rent",
},
)
# Parallel Coordinates Plot
px.parallel_coordinates(top_ten, color='sale_price_sqr_foot',
title="Parallel Categories Plot of Most Expensive Neighborhoods in San Francisco per year")
# Sunburst Plot
#df = px.most_expensive.tips()
fig = px.sunburst(most_expensive, path=['year', 'neighborhood'],
values='gross_rent', color='gross_rent',
color_continuous_scale=px.colors.sequential.Blues,
height=600,
title="Sunburst Chart of Most Expensive Neighborhoods in San Francisco per year")
fig.show()